Unsupervised feature selection for sparse data

نویسندگان

  • Artur J. Ferreira
  • Mário A. T. Figueiredo
چکیده

Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many features have zero value. In some cases, we do not known the class label for some (or even all) patterns in the dataset, leading us to semi-supervised or unsupervised learning problems. For instance, in text classification with the bag-of-words (BoW) representations, there is usually a large number of features, many of which may be irrelevant (or even detrimental) for categorization tasks. In this paper, we propose one efficient unsupervised feature selection technique for sparse data, suitable for both standard floating point and binary features. The experimental results on standard datasets show that the proposed method yields efficient feature selection, reducing the number of features while simultaneously improving the classification accuracy.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

متن کامل

Embedded Unsupervised Feature Selection

Sparse learning has been proven to be a powerful technique in supervised feature selection, which allows to embed feature selection into the classification (or regression) problem. In recent years, increasing attention has been on applying spare learning in unsupervised feature selection. Due to the lack of label information, the vast majority of these algorithms usually generate cluster labels...

متن کامل

Automatically Redundant Features Removal for Unsupervised Feature Selection via Sparse Feature Graph

The redundant features existing in high dimensional datasets always affect the performance of learning and mining algorithms. How to detect and remove them is an important research topic in machine learning and data mining research. In this paper, we propose a graph based approach to find and remove those redundant features automatically for high dimensional data. Based on the framework of spar...

متن کامل

Nonparametric Redundant Features Removal for Unsupervised Feature Selection: An Empirical Study based on Sparse Feature Graph

It is known that the redundant features existing in high dimensional datasets will affect the performance of learning and mining algorithms. How to detect and remove them is a challenge in machine learning and data mining research. In this work, we propose a graph based approach to search and remove those redundant features automatically for high dimensional data. A sparse graph is generated at...

متن کامل

Adaptive Unsupervised Multi-view Feature Selection for Visual Concept Recognition

To reveal and leverage the correlated and complemental information between different views, a great amount of multi-view learning algorithms have been proposed in recent years. However, unsupervised feature selection in multiview learning is still a challenge due to lack of data labels that could be utilized to select the discriminative features. Moreover, most of the traditional feature select...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011